Conversation

@wking
Member

@wking wking commented Apr 16, 2025

When Image counts get absurdly large (hundreds of thousands), pruning can fail with errors like:

Error from server (Expired): The provided continue parameter is too old to display a consistent list result. You can start a new list without the continue parameter, or use the continue token in this response to retrieve the remainder of the results. Continuing with the provided token results in an inconsistent list - objects that were created, modified, or deleted between the time the first chunk was returned and now may show up in the list.

With this commit, I'm still printing that error as a warning, but if we got any Images back before the error, I'm going to try to prune those anyway, because some progress in pruning may get the Image count down low enough that a future invocation will go through cleanly.

@openshift-ci openshift-ci bot requested review from atiratree and ingvagabund April 16, 2025 23:23
@openshift-ci
Contributor

openshift-ci bot commented Apr 16, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking
Once this PR has been reviewed and has the lgtm label, please assign ardaguclu for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking wking force-pushed the allow-partial-image-retrieval-when-pruning branch 2 times, most recently from 98d0b45 to fdb30bd Compare April 16, 2025 23:30
@wking wking force-pushed the allow-partial-image-retrieval-when-pruning branch from fdb30bd to 6097a9b Compare April 17, 2025 00:04
@openshift-ci
Contributor

openshift-ci bot commented Jul 16, 2025

@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | 6097a9b | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 6097a9b | link | false | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-aws-ovn-serial-1of2 | 6097a9b | link | true | /test e2e-aws-ovn-serial-1of2 |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2025
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 14, 2025
@coderabbitai
coderabbitai bot commented Nov 14, 2025

Walkthrough

The PR modifies error handling in the images pruning command by introducing an acceptableErrors collection to accumulate errors during image listing. When listing errors occur alongside successful image collection, warnings are emitted instead of failing immediately.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Images pruning error handling: pkg/cli/admin/prune/images/images.go | Introduces acceptableErrors collection; modifies logic to emit warnings to ErrOut when image listing fails after partial collection; aggregates acceptableErrors into final error set after summary printing |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Subtle error handling logic requires careful reasoning around when to emit warnings vs. return errors immediately
  • Understanding the control flow of error aggregation and its interaction with partial collection results
  • Verification that the acceptableErrors aggregation correctly handles edge cases and error scenarios
Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/cli/admin/prune/images/images.go (1)

425-430: Consider refining the warning message tone.

The error handling logic is sound and aligns with the PR objective to allow partial progress. However, the warning message uses informal phrasing ("we got %d, so keep going").

Consider this slightly more formal alternative:

-			fmt.Fprintf(o.ErrOut, "warning: error retrieving images, but we got %d, so keep going to see if we can prune any of those: %s\n", len(allImages), err)
+			fmt.Fprintf(o.ErrOut, "warning: error retrieving images (%s), but %d images were retrieved successfully; attempting to prune those images\n", err, len(allImages))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between beeaf09 and 6097a9b.

📒 Files selected for processing (1)
  • pkg/cli/admin/prune/images/images.go (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • pkg/cli/admin/prune/images/images.go
🔇 Additional comments (1)
pkg/cli/admin/prune/images/images.go (1)

416-416: LGTM!

The acceptableErrors slice declaration is appropriate for accumulating non-fatal errors during image listing.

Comment on lines +537 to +539
	if errs == nil {
		errs = kutilerrors.NewAggregate(acceptableErrors)
	}
⚠️ Potential issue | 🟠 Major

Aggregate both error sets to preserve all error context.

The current logic only returns acceptableErrors when pruning succeeds (errs == nil). If pruning fails, the acceptableErrors from image listing are lost, hiding important diagnostic information from users.

Apply this diff to aggregate both error sets:

 	fmt.Fprintf(o.Out, "Summary: %s\n", stats)
-	if errs == nil {
-		errs = kutilerrors.NewAggregate(acceptableErrors)
+	if len(acceptableErrors) > 0 {
+		allErrors := append(acceptableErrors, errs)
+		errs = kutilerrors.NewAggregate(allErrors)
 	}
 	return errs

Note: kutilerrors.NewAggregate handles nil errors gracefully, so this preserves existing behavior while ensuring acceptableErrors are never lost.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-	if errs == nil {
-		errs = kutilerrors.NewAggregate(acceptableErrors)
-	}
+	if errs == nil {
+		errs = kutilerrors.NewAggregate(acceptableErrors)
+	} else if len(acceptableErrors) > 0 {
+		allErrors := make([]error, 0, len(acceptableErrors)+1)
+		allErrors = append(allErrors, acceptableErrors...)
+		allErrors = append(allErrors, errs)
+		errs = kutilerrors.NewAggregate(allErrors)
+	}
🤖 Prompt for AI Agents
In pkg/cli/admin/prune/images/images.go around lines 537 to 539, the current
code replaces errs with only acceptableErrors when errs is nil, losing
acceptableErrors when errs is non-nil; modify this so both error sets are
aggregated into a single error (using kutilerrors.NewAggregate) regardless of
whether errs is nil, ensuring acceptableErrors are appended/combined with errs
and assigned back to errs so no error context is lost.
